Skip to content

Conversation

@ARR4N
Copy link
Contributor

@ARR4N ARR4N commented Nov 19, 2025

If snapshot.diskLayer.generate() completes early enough it set its genMarker field to nil and races with the diskLayer.stopGeneration() method, possibly leaking the generate() goroutine as it never receives on the genAbort channel. In practice this only means that Disable() would leak the goroutine, and that any test with goleak.Verify*() needs to ignore it. A cursory read suggests that Journal() will successfully release resources.

I've so far only demonstrated this in the test but not fixed it. Before doing so, I wanted to see if anyone thought it was worth it. For me it's only a minor annoyance in always having to ignore the method when checking for my own leaks.

@ARR4N ARR4N changed the title fix(snapshot): race resulting in generate() goroutine leak snapshot: race resulting in generate() goroutine leak Nov 19, 2025
@ARR4N ARR4N changed the title snapshot: race resulting in generate() goroutine leak core/state/snapshot: race resulting in generate() goroutine leak Nov 19, 2025
@ARR4N
Copy link
Contributor Author

ARR4N commented Nov 19, 2025

@rjl493456442 I'm gonna leave this in draft as it's only the test demonstration for now; please let me know if you think it's worth addressing.

@willowhaven2219
Copy link

We can break down the situation and see if it’s “worth fixing” versus just ignoring in tests.

⚙️ What’s happening

  • snapshot.diskLayer.generate() can complete “too early,” setting genMarker = nil.
  • Meanwhile, diskLayer.stopGeneration() may race with that state, leaving the generate() goroutine blocked forever since it never receives on genAbort.
  • The practical effect:
    • Disable() leaks the goroutine.

    • Journal() appears to clean up correctly, so normal usage doesn’t accumulate leaks.

    • Tests using goleak.Verify*() will consistently flag this unless you explicitly ignore it.


📌important on Why it matters

  • Production impact: If Journal() is the usual cleanup path, then in real-world usage resources are released and the leak doesn’t accumulate. That makes it low severity.
  • Testing impact: You’ll always have to add an ignore for Disable(). This is annoying but predictable.
  • Code hygiene: Goroutine leaks—even benign ones—are usually worth fixing because they signal a race or incomplete lifecycle management. Even if harmless now, they can become problematic if code evolves.

🛠️ Options

  1. Fix the race

    • Ensure stopGeneration() always signals genAbort regardless of genMarker state.
    • Or restructure so generate() checks for completion before blocking.
    • This removes the need for ignores in tests and makes lifecycle management airtight.
  2. Document + Ignore

    • Accept that Disable() leaks, document it clearly, and add goleak.IgnoreCurrent() or similar in tests.
    • This is pragmatic if the leak is provably harmless and fixing it would add complexity or risk.

Recommendation
If you’re already touching this code, fixing it is cleaner. Goroutine leaks are one of those “small now, big later” issues—especially in libraries where users may call Disable() in unexpected ways. Even if it’s only a test annoyance today, tightening the lifecycle avoids future surprises.

That said, if the code is stable, rarely touched, and the fix would be invasive, documenting and ignoring in tests is defensible. It depends on whether you want to prioritize developer ergonomics (no ignores, clean tests) or minimal churn (leave as-is).


👉I'm Curious: do you see Disable() being used in production paths, or is it really just a test helper for you? If it’s test-only, ignoring might be fine. If it’s exposed to users, I’d lean toward fixing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants